Review of "Introduction to clustering large and high-dimensional data" by J. Kogan
Abstract
Roughly speaking, clustering is a data analysis task that groups a set of items into categories so that items within a category are similar and items in different categories are dissimilar, where similarity and dissimilarity depend on the definition of distance between items. Although clustering has been known for many decades, it has recently gained importance because of the exponential growth of digital libraries and the World Wide Web and the resulting need to find and extract information. Motivated by these Information Retrieval (IR) applications, which are usually characterized by large, sparse, and high-dimensional data, “Introduction to Clustering Large and High-Dimensional Data” by J. Kogan is a textbook that tries to focus on a few clustering techniques that are very common in IR. In particular, it focuses on the k-means algorithm, which is by far the most popular one in IR, and on many of its variations, among them incremental k-means, spherical k-means, quadratic k-means, and k-means with divergences.
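For orientation, the basic batch k-means iteration mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard assign-then-update scheme with squared Euclidean distance, not code from the book under review; the function names `kmeans`, `squared_distance`, and `mean`, the random initialization, and the toy data are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Batch k-means sketch: returns (centroids, labels).

    Assumption: centroids are initialized by sampling k distinct
    input points at random; other initializations are possible.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the two well-separated groups of three points should end up with different labels. Variants such as spherical k-means replace the distance and the centroid update, for example by working with unit-length vectors and cosine similarity.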
Similar resources
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high-dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high-dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Acetylation of wood – A review
Wood is a porous, three-dimensional, hygroscopic, viscoelastic, anisotropic bio-polymer composite composed of an interconnecting matrix of cellulose, hemicelluloses and lignin with minor amounts of inorganic elements and organic extractives. Some, but not all, of the cell wall polymer hydroxyl groups are accessible to moisture and these accessible hydroxyls form hydrogen bonds with water. As the...
Application of modified balanced iterative reducing and clustering using hierarchies algorithm in parceling of brain performance using fMRI data
Introduction: Clustering of the human brain is a very useful tool for diagnosis, treatment, and tracking of brain tumors. Several methods exist for this purpose. In this study, modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) was introduced for brain activation clustering. This algorithm has an appropriate speed and good scalability in dealing ...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Journal title:
- Computer Science Review
Volume 2, Issue
Pages -
Publication date: 2008